INTRO TO BUSINESS PROBLEM

Why the Regork CEO should be interested?

1. Milk (Fluid-Milk White) is the highest sold item in the grocery department. Thus, analysis on milk provides huge possibilities in increasing the revenue as a dedicated demand is already present.

2. Due to its essential nature, milk plays a pivotal role in determining the layout and organization of store shelves.

3. The positive correlation between milk and other essential goods indicates a tendency for customers to purchase them in conjunction, highlighting potential marketing and sales opportunities

Methodology followed:

Strategic Framework for Augmenting Milk Sales Revenue

1. Basket Analysis:

Conduct a meticulous assessment of products frequently purchased alongside milk.Implement tailored advertising campaigns focusing on grouped product promotions.

2. Income Groups:

Target marketing endeavors towards income groups that exert the most significant influence on milk sales.

Customize campaigns to resonate with the preferences and purchasing behaviors of these demographics.

3. Age Group Analysis:

Thoroughly analyze sales data to identify the age group displaying the highest propensity for milk purchases.

Develop targeted strategies to engage and capture the attention of this specific age demographic.

4. Household Size:

Devote resources and efforts towards effectively penetrating and catering to these identified household size segments.

5. Weekly Trend Analysis:

Implement pre-planned promotional activities, including the timely distribution of coupons, to capitalize on these identified peak periods.

LOADED PACKAGES:

suppressWarnings(suppressMessages(library(completejourney)))
suppressWarnings(suppressMessages(library(tidyverse)))
suppressWarnings(suppressMessages(library(dplyr)))
suppressWarnings(suppressMessages(library(ggplot2)))

DATA PREPARATION:

For the Data Preparation we are using Complete journey data especially focussing on Transactions, Products, Demographics, Coupons datasets to generate trend analysis, patterns for further data deliberation and interpretation.

x <- get_transactions()
x
p <- products
p
coup <- coupons
coup
coup_red <- coupon_redemptions
coup_red
dem <- demographics
dem

EXPLORATORY DATA ANALYSIS:

Reason to Opt for product based analysis of Fluid Milk White Only:

For initial data interpretation we have joined the transactions with product tables, in order to bring out product_id or product_type with highest sales with respect to department. With aim to generate a specific product based analysis we intended to go with product ‘FLUID MILK WHITE ONLY’ as it gives highest sales recorded within grocery department as well as it gives a positive correlation, the possibility of customer buying the milk product alongside other staple products like eggsand bananas. Here is a plot to illustrate the afformentioned logic.

tran_prod <- x %>%
  left_join(p, by = 'product_id')
tran_prod
tran_by_product <- tran_prod %>%
  group_by(product_id, product_type,department) %>%
  summarise(total_sales = sum(sales_value), .groups = 'drop') %>%
  arrange(desc(total_sales))
tran_by_product
filt_data3<-filter(tran_prod, product_id==1029743)  # milk
filt_prod_2<-filter(tran_prod, product_id==1082185) # bananas
filt_prod_3<-filter(tran_prod, product_id==981760)  # eggs
only_milk<-filt_data3 %>%
  left_join(demographics, by = "household_id") # gettings ages and incomes

only_milk_noNA <- na.omit(only_milk)

milk_baskets <- filt_data3$basket_id

# Find the total number of basket IDs in filt_data3
baskets_with_milk_only <- n_distinct(filt_data3$basket_id)

# Find baskets that contain both milk and bananas (product_id == 1082185)
bananas_with_milk <- n_distinct(filter(filt_prod_2, basket_id %in% milk_baskets)$basket_id)

# Find baskets that contain milk, bananas, and eggs (product_id == 981760)
eggs_with_milk_and_bananas <- n_distinct(filter(filt_prod_3, basket_id %in% milk_baskets)$basket_id)

# Print the results
cat("Baskets buying Milk Only:", baskets_with_milk_only, "\n")
## Baskets buying Milk Only: 7874
cat("Baskets buying Milk and Bananas:", bananas_with_milk, "\n")
## Baskets buying Milk and Bananas: 1909
cat("Baskets buiying Milk, Bananas, and Eggs:", eggs_with_milk_and_bananas, "\n")
## Baskets buiying Milk, Bananas, and Eggs: 812
basket_data <- data.frame(
  Category = c("Baskets with Milk and Bananas", "Baskets with Milk, Bananas, and Eggs", "Baskets with Milk Only"),
  Count = c(bananas_with_milk, eggs_with_milk_and_bananas, baskets_with_milk_only)
)

# Calculate the share
basket_data$Share <- basket_data$Count / sum(basket_data$Count)

# Create a pie chart
pie_chart <- ggplot(basket_data, aes(x = "", y = Share, fill = Category)) +
  geom_bar(stat = "identity", width = 1) +
  coord_polar(theta = "y") +
  labs(title = "Basket Composition") +
  theme_void() +
  scale_fill_brewer(palette = "Set3") +
  geom_text(aes(label = scales::percent(Share)), position = position_stack(vjust = 0.5))

# Display the pie chart
print(pie_chart)

Total Sales Value by Income Range

This plot allows you to see how sales of ‘Fluid Milk White Only’ are distributed across different income ranges, making it useful for understanding the product’s performance in relation to income levels, Interestingly we can observe 50-74k income group contributing highest to overall product sales.

Note: its important to comprehend that the product being a staple product has seen almost no discount given over the 52 week period, which might have contributed to the saturation in its market sales as can be observed from below plot.

ggplot(only_milk_noNA, aes(x = income, y = sales_value, fill = income)) +
  geom_bar(stat = "identity") +
  labs(title = "Sales Value by Income Range (Fluid Milk White Only)",
       x = "Income Range",
       y = "Sales Value") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) 

Total Spending by Age Group

Through this plot we can infer that 45-54 and 35-44 age groups have higher sales with respect to milk.

filt_data3 <- filter(tran_prod, product_id == 1029743)  # Fluid Milk White Only
only_milk <- filt_data3 %>%
  left_join(demographics, by = "household_id") # getting ages and incomes

only_milk_noNA <- na.omit(only_milk)

age_group_spending <- only_milk_noNA %>%
  group_by(age) %>%
  summarise(total_spending = sum(sales_value)) %>%
  arrange(desc(total_spending))
age_group_spending
ggplot(age_group_spending, aes(x = age, y = total_spending, fill = factor(age))) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Total Spending by Age Group (Fluid Milk White Only)",
       x = "Age Group",
       y = "Total Sales Value") +
  scale_fill_brewer(palette = "Set3") +  # Choose a color palette (you can change "Set3" to any other valid palette)
  theme_minimal()  # Apply a minimal theme for cleaner look

Total Sales Value by Income Range for 45-54 age group

Now we analyze the age groups of 35-44 and 45-54, by total product sales contribution to analyse sales patterns of customers by their specific income ranges. since both these groups present a prospectus of further growth in sales revenue by targeted sales campaign.

Now dissect 45-54 Age group further by Income ranges, 50-74 income range contributes to highest sales while considerable push is needed to increase sales from 75-99K income group.

age_group_45_54 <- only_milk_noNA %>%
  filter(age == "45-54")
income_range_spending <- age_group_45_54 %>%
  group_by(income) %>%
  summarise(total_spending = sum(sales_value)) %>%
  arrange(desc(total_spending))
print(income_range_spending)
## # A tibble: 11 × 2
##    income    total_spending
##    <ord>              <dbl>
##  1 50-74K             887. 
##  2 75-99K             510. 
##  3 125-149K           333. 
##  4 35-49K             284. 
##  5 25-34K             273. 
##  6 Under 15K          176. 
##  7 15-24K             170. 
##  8 150-174K           108  
##  9 100-124K            98.7
## 10 250K+               56.7
## 11 175-199K            10.8
ggplot(age_group_45_54, aes(x = income, y = sales_value, fill = income)) +
  geom_bar(stat = "identity") +
  labs(title = "Total Sales Value by Income Range for 45-54 age group",
       x = "Income Range",
       y = "Total Sales Value") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Total Sales Value by Income Range for 35-44 Age Group

age_group_35_44 <- only_milk_noNA %>%
  filter(age == "35-44")

income_range_spending <- age_group_35_44 %>%
  group_by(income) %>%
  summarise(total_spending = sum(sales_value)) %>%
  arrange(desc(total_spending))
print(income_range_spending)
## # A tibble: 12 × 2
##    income    total_spending
##    <ord>              <dbl>
##  1 50-74K             597. 
##  2 35-49K             564. 
##  3 75-99K             437. 
##  4 150-174K           261. 
##  5 125-149K           214. 
##  6 Under 15K          162. 
##  7 15-24K             157. 
##  8 25-34K              73.0
##  9 100-124K            51.9
## 10 200-249K            29.8
## 11 175-199K            29.4
## 12 250K+               28.7
ggplot(age_group_35_44, aes(x = income, y = sales_value, fill = income)) +
  geom_bar(stat = "identity") +
  labs(title = "Total Sales Value by Income Range for 35-44 Age Group",
       x = "Income Range",
       y = "Total Sales Value") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Significantly high sales are consistently recorded for 35-44 age group across multiple income brackets.

Therefore, directing campaigns towards this demographic is crucial for the increased sales of milk.

SO the campaigns have to be focused on the 35-44 age group.

Sales Value by House Hold SIze

This plot allows you to see how sales of ‘Fluid Milk White Only’ are distributed across different household sizes, making it useful for understanding the product’s performance in relation to household size. We can infer that household size of 2 clearly provides highest sales. A targeted sales campaign towards households of size 2 presents a logical choice to boost overall product sales.

ggplot(only_milk_noNA, aes(x = household_size, y = sales_value, fill = household_size)) +
  geom_bar(stat = "identity") +
  labs(title = "Sales Value by Household Size (Fluid Milk White Only)",
       x = "Household size",
       y = "Sales Value") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Weekly Sales of Product

This plot allows you to visualize the weekly sales trend for ‘Fluid Milk White Only’ and easily identify the top 3 sales peaks, which could be crucial for understanding sales patterns and making strategic decisions.

A similar over all trend can be observed all over the 52 week, but we can also observe a significant hikes in sales when customer purchases coincide with holiday seasons.

weekly_sales <- only_milk_noNA %>%
  group_by(week) %>%
  summarise(total_sales = sum(sales_value))
p <- ggplot(weekly_sales, aes(x = factor(week), y = total_sales)) +
  geom_line(aes(group = 1), size = 1.5) +
  labs(title = "Weekly Sales Trend for Fluid Milk White Only",
       x = "Week",
       y = "Total Sales Value") +
  scale_x_discrete(breaks = seq(min(weekly_sales$week), max(weekly_sales$week), by = 4))
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
# Identify the top 3 peaks
top_peaks <- weekly_sales %>%
  arrange(desc(total_sales)) %>%
  head(3)

# Add points for the top 3 peaks
p + geom_point(data = top_peaks, aes(x = factor(week), y = total_sales), color = "red", size = 3) +
  geom_text(data = top_peaks, aes(x = factor(week), y = total_sales, label = week), 
            vjust = -0.5, hjust = 0.5, color = "red", size = 3)

During weeks 28, 49, and 53, there is a notable surge in milk sales.

This phenomenon is attributed to the onset of summer vacation, the holiday season, and the commencement of the new year.

It is imperative to distribute campaigns and coupons prior to these weeks to maximize their impact during these peak sales periods

SUMMARY

Business Question:

Goal is to analyze Regork’s customer behavior and product data to augment product type sales specific to ‘Fluid Milk White Only’ within the Grocery department and learn how to effectively engage our customers through targeted campaigns to generate sustainable traffic and sales.

How you addressed this problem statement:

By analyzing consumer behavior patterns across different parameters: Basket Analysis, Age-Group, Income levels, weekly trend analysis, household size, we were able to identify trends.

Interesting Insights found:

Based on primary observation we found consistency in 3 product categories irrespective of seasonal tends, these products being REG UNLEADED FUEL, MILK, SOFT DRINKS. We deep dived into product based analysis we found ‘Fluid Milk White Only’ contributing heighest sales in Grocery Department

During weeks 28, 49, and 53, there is a notable surge in milk sales. It is due to onset of summer vacation and onset of holiday seaspn.

Another interesting observation is that there are almost no coupon discounts provided for the product through out the year, signifying the

Analysis and Recommendations:

1.Based on above data analysis, a sales strategy can be devised besed on both income groups and age:

Income-Based Analysis (50-74k):

The 50-74k income bracket consistently demonstrates the highest demand for milk across all age groups. A targeted marketing campaign tailored to the preferences and habits of this demographic, emphasizing quality, health benefits, and convenience, is advised. Additionally, implementing promotions and discounts, such as bundle deals or loyalty programs, can further tap into the potential of this income group.

Age-Based Analysis (35-44):

Within the 35-44 age group, there is a consistent demand for milk, regardless of income level. This suggests that factors beyond earnings play a significant role in consumption habits. To capitalize on this, a campaign highlighting the nutritional benefits of milk and its adaptability in various daily routines is recommended. Emphasizing the convenience of incorporating milk into common meals and snacks can resonate with this demographic. Furthermore, conducting consumer surveys or focus groups within this age group can provide deeper insights into their preferences and needs, allowing for more targeted marketing efforts.

2.There exists a seasonal variation in trend of milk sales as can be seen in line graph, especially in week 27(June 26th- start of summer vacation), week 45 to 49 (OCT 29th - NOV 26th Holiday Season - Haloween, ThnaksGiving, winter break preparations), even weather influences a rush in Milk purchases before hand. Providing a coupule of coupons as part of strategic campaign could witness boost in sales.

Basket analysis helped us to identify the products that are purchased along with milk. Thus, we propose campaigns that encompass milk, bananas and eggs.